19 research outputs found

    Detecting the molecular scars of evolution in the Mycobacterium tuberculosis complex by analyzing interrupted coding sequences

    Background: Computer-assisted analyses have shown that all bacterial genomes contain a small percentage of open reading frames with a frameshift or an in-frame stop codon. We report here a comparative analysis of these interrupted coding sequences (ICDSs) in six isolates of M. tuberculosis, two of M. bovis and one of M. africanum, and question their phenotypic impact and evolutionary significance.
    Results: ICDSs were classified as "common to all strains" or "strain-specific". Common ICDSs are believed to result from mutations acquired before the divergence of the species, whereas strain-specific ICDSs were acquired after this divergence. Comparative analyses of these ICDSs therefore define the molecular signature of a particular strain, phylogenetic lineage or species, which may be useful for inferring phenotypic traits such as virulence, as well as molecular relationships. For instance, in silico analysis shows that the W-Beijing lineage of M. tuberculosis, an emergent family involved in several outbreaks, is readily distinguishable from other lineages by its smaller number of common ICDSs, including at least one known to be associated with virulence. This observation was confirmed by sequencing the ICDSs of a panel of 21 clinical M. tuberculosis strains. This analysis further illustrates the divergence of the W-Beijing lineage from other lineages in terms of the number of full-length ORFs not containing a frameshift. We further show that ICDS formation is not associated with the presence of a mutated promoter, and suggest that promoter extinction is not the main cause of pseudogene formation.
    Conclusion: The correlation between ICDSs, function and phenotypes could have important evolutionary implications. This study provides population geneticists with a list of targets that could undergo selective pressure and thus alter relationships between the various lineages of M. tuberculosis strains and their host. This approach could be applied to any closely related bacterial strains or species for which several genome sequences are available.
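    The "common to all strains" versus "strain-specific" classification described above reduces to set operations once ICDS calls are available per strain. The Python sketch below assumes each strain's calls are already a set of gene identifiers; the function name classify_icds and the toy identifiers are hypothetical and do not come from the study.

```python
def classify_icds(icds_by_strain: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split per-strain ICDS calls into shared and strain-specific sets.

    icds_by_strain maps a strain name to the set of (hypothetical) gene
    identifiers that carry a frameshift or in-frame stop in that strain.
    """
    if not icds_by_strain:
        return set(), {}
    # "Common to all strains": interrupted in every genome compared.
    common = set.intersection(*icds_by_strain.values())
    specific = {}
    for strain, genes in icds_by_strain.items():
        # Union of the ICDS calls found in every other strain.
        others = set().union(*(g for s, g in icds_by_strain.items() if s != strain))
        # "Strain-specific": interrupted in this strain only.
        specific[strain] = genes - others
    return common, specific


calls = {
    "strain1": {"geneA", "geneB", "geneC"},
    "strain2": {"geneA", "geneB"},
    "strain3": {"geneA", "geneD"},
}
common, specific = classify_icds(calls)
print(common)    # {'geneA'}
print(specific)  # {'strain1': {'geneC'}, 'strain2': set(), 'strain3': {'geneD'}}
```

    ICDSs shared by some but not all strains (geneB in the toy data) fall into neither class; those partially shared sets are where a lineage-level signature would show up.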

    ICDS database: interrupted CoDing sequences in prokaryotic genomes

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequences (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database, which contains ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or by similarity searches via a web interface (). For each interrupted gene, the definition is provided together with the ICDS genomic localization and the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing. The database will be regularly updated with data from newly sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of the predicted ICDS with the results of comparing the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS in the recently sequenced genome of Mycobacterium smegmatis. These tests allow us to estimate the specificity and sensitivity of our program (95 and 82%, respectively) and the efficiency of primer determination.
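    The database detects ICDS with a similarity-based approach, which the sketch below does not reproduce. It only illustrates the two naive signatures of an interrupted coding sequence, an internal in-frame stop codon and a length that is not a multiple of three, assuming the standard bacterial stop codons (TAA, TAG, TGA). The function names are hypothetical.

```python
# Assumption: standard bacterial stop codons (NCBI translation table 11).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_offsets(cds: str) -> list[int]:
    """Return 0-based offsets of in-frame stop codons that precede the last codon."""
    seq = cds.upper()
    full_len = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, full_len, 3)]
    # The final codon is the expected terminator, so it is not counted.
    return [3 * k for k, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_interrupted(cds: str) -> bool:
    """Flag a CDS whose length is not a multiple of 3 or that contains an internal stop."""
    return len(cds) % 3 != 0 or bool(internal_stop_offsets(cds))


print(internal_stop_offsets("ATGAAATAGGCTTAA"))  # [6]
print(looks_interrupted("ATGAAAGCTTAA"))         # False
print(looks_interrupted("ATGAAAGCTTA"))          # True (length 11)
```

    An ORF annotated only up to the first stop downstream of a frameshift can pass both checks, which is one reason a comparison against homologous proteins is needed to recover such cases.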

    Insights into metazoan evolution from Alvinella pompejana cDNAs.

    BACKGROUND: Alvinella pompejana is a representative of Annelids, a key phylum for evo-devo studies that is still poorly studied at the sequence level. A. pompejana inhabits deep-sea hydrothermal vents and is currently known as one of the most thermotolerant Eukaryotes in marine environments, withstanding the largest known chemical and thermal ranges (from 5 to 105°C). This tube-dwelling worm forms dense colonies on the surface of hydrothermal chimneys and can withstand long periods of hypo/anoxia and long phases of exposure to hydrogen sulphides. A. pompejana specifically inhabits chimney walls of hydrothermal vents on the East Pacific Rise. To survive, Alvinella has developed numerous adaptations at the physiological and molecular levels, such as an increase in the thermostability of proteins and protein complexes. It represents an outstanding model organism for studying adaptation to harsh physicochemical conditions and for isolating stable macromolecules resistant to high temperatures.
    RESULTS: We have constructed four full-length-enriched cDNA libraries to investigate the biology and evolution of this intriguing animal. Analysis of more than 75,000 high-quality reads led to the identification of 15,858 transcripts and 9,221 putative protein sequences. Our annotation reveals a good coverage of most animal pathways and networks, with a prevalence of transcripts involved in oxidative stress resistance, detoxification, anti-bacterial defence and heat shock protection. Alvinella proteins seem to show a slow evolutionary rate and a higher similarity with proteins from Vertebrates than with proteins from Arthropods or Nematodes. Their composition shows an enrichment in positively charged amino acids that might contribute to their thermostability. The gene content of Alvinella reveals that an important pool of genes previously considered to be specific to Deuterostomes was in fact already present in the last common ancestor of the Bilaterian animals but was secondarily lost in model invertebrates. This pool is enriched in glycoproteins that play a key role in intercellular communication, hormonal regulation and immunity.
    CONCLUSIONS: Our study starts to unravel the gene content and sequence evolution of a deep-sea annelid, revealing key features in eukaryote adaptation to extreme environmental conditions and highlighting the proximity of Annelids and Vertebrates.
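    The compositional comparison mentioned above comes down to measuring, for each protein, the fraction of positively charged residues. A minimal Python sketch, assuming lysine (K) and arginine (R) are counted as positively charged (histidine can be added through the positive parameter) and that sequences are one-letter strings; the function names are hypothetical and not those used in the study.

```python
# Assumption: only Lys (K) and Arg (R) are counted as positively charged.
POSITIVE = frozenset("KR")


def positive_fraction(protein: str, positive: frozenset = POSITIVE) -> float:
    """Fraction of residues in a one-letter protein sequence that are positively charged."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]  # drops '*', '-', digits
    if not residues:
        return 0.0
    return sum(aa in positive for aa in residues) / len(residues)


def mean_positive_fraction(proteins: list[str], positive: frozenset = POSITIVE) -> float:
    """Average per-protein fraction, e.g. to compare two sets of orthologous proteins."""
    fractions = [positive_fraction(p, positive) for p in proteins]
    return sum(fractions) / len(fractions) if fractions else 0.0


print(positive_fraction("MKRLE"))                    # 0.4
print(positive_fraction("MKRHE", frozenset("KRH")))  # 0.6
```

    Comparing mean_positive_fraction over matched sets of orthologs from two species is one simple way to express an "enrichment"; whether the study used this measure is not stated here.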

    A new protein linear motif benchmark for multiple sequence alignment software

    Background: Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking for regulatory complex assembly, and protein processing and cleavage. Methods for LM detection now being developed depend strongly on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs, which are specifically optimised to align globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.
    Results: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs, compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed for full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.
    Conclusion: We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences, and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.
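    Alignment quality in BAliBASE-style benchmarks is commonly summarised by the sum-of-pairs score (SPS): the share of residue pairs aligned in the reference that the test alignment also aligns. The Python sketch below assumes both alignments contain the same sequences in the same row order and that "-" is the only gap symbol; it scores every column, whereas the benchmark's own scoring may restrict the comparison to motif or reference regions, which is not reproduced here. Function names are hypothetical.

```python
from itertools import combinations

# (sequence i, residue index in i, sequence j, residue index in j), with i < j
Pair = tuple[int, int, int, int]


def aligned_pairs(alignment: list[str]) -> set[Pair]:
    """Collect every residue pair that an alignment places in the same column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    next_residue = [0] * len(alignment)  # per-row count of residues seen so far
    pairs: set[Pair] = set()
    for col in range(len(alignment[0]) if alignment else 0):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, next_residue[s]))
                next_residue[s] += 1
        # Every two residues sharing this column form one aligned pair.
        for (si, ri), (sj, rj) in combinations(present, 2):
            pairs.add((si, ri, sj, rj))
    return pairs


def sps(test: list[str], reference: list[str]) -> float:
    """Sum-of-pairs score: share of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["AB-C", "A-BC"]
print(sps(["AB-C", "A-BC"], reference))  # 1.0
print(sps(["ABC-", "A-BC"], reference))  # 0.5
print(sps(["ABC-", "-ABC"], reference))  # 0.0
```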

    Constitutive expression of a complement-like protein in Toll and JAK gain-of-function mutants of Drosophila

    We show that Drosophila expresses four genes encoding proteins with significant similarity to the thiolester-containing proteins of the complement C3/α(2)-macroglobulin superfamily. The genes are transcribed at a low level during all stages of development, and their expression is markedly up-regulated after an immune challenge. For one of these genes, which is predominantly expressed in the larval fat body, we observe constitutive expression in gain-of-function mutants of hop, which encodes the Janus kinase (JAK), and reduced inducibility in loss-of-function hop mutants. We also observe constitutive expression in gain-of-function Toll mutants. We discuss the possible roles of these novel complement-like proteins in the Drosophila host defense.

    A new protein linear motif benchmark for multiple sequence alignment software-4

    subset 1, showing the extreme observations (stars or circles), lower quartile, median, upper quartile, and largest observation in each similarity category. b) Execution times in seconds required to construct all the multiple alignments in Subset 1. Programs are displayed in the order given by the Friedman test using the SPS scores for group V11 (additional file ), with the highest-scoring program on the left.
    Copyright information: Taken from "A new protein linear motif benchmark for multiple sequence alignment software", http://www.biomedcentral.com/1471-2105/9/213. BMC Bioinformatics 2008;9:213. Published online 25 Apr 2008. PMCID: PMC2374782.
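    The program ordering in the caption comes from a Friedman test on SPS scores. The exact procedure used in the paper is not given here; the Python sketch below shows the standard version under stated assumptions: within each test case programs are ranked by score (rank 1 = highest, ties share the mean rank), ranks are averaged per program, and the statistic is chi2 = 12N/(k(k+1)) * (sum of squared mean ranks - k(k+1)^2/4), without tie correction or p-value. Program names and scores are hypothetical.

```python
def average_ranks(scores: list[float]) -> list[float]:
    """Rank scores so that the highest gets rank 1; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1  # 1-based mean of tied positions
        pos = end + 1
    return ranks


def friedman(table: dict[str, list[float]]) -> tuple[list[tuple[str, float]], float]:
    """table maps program -> score per test case (same case order for every program)."""
    if not table:
        raise ValueError("no programs given")
    programs = list(table)
    k = len(programs)
    n_cases = len(table[programs[0]])
    if n_cases == 0:
        raise ValueError("no test cases given")
    totals = dict.fromkeys(programs, 0.0)
    for case in range(n_cases):
        ranks = average_ranks([table[p][case] for p in programs])
        for p, r in zip(programs, ranks):
            totals[p] += r
    mean_ranks = {p: totals[p] / n_cases for p in programs}
    chi2 = 12 * n_cases / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks.values()) - k * (k + 1) ** 2 / 4
    )
    # Best (lowest) mean rank first, matching "highest-scoring program on the left".
    return sorted(mean_ranks.items(), key=lambda item: item[1]), chi2


ordering, chi2 = friedman({
    "ProgA": [0.9, 0.8, 0.7],
    "ProgB": [0.5, 0.6, 0.4],
    "ProgC": [0.1, 0.2, 0.3],
})
print(ordering)  # [('ProgA', 1.0), ('ProgB', 2.0), ('ProgC', 3.0)]
print(chi2)      # 6.0
```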

    A new protein linear motif benchmark for multiple sequence alignment software-2

    different conditions, showing the extreme observations (stars or circles), lower quartile, median, upper quartile, and largest observation. Significant differences according to a Wilcoxon signed-rank test (p < 0.05) are indicated by an asterisk on the x-axis. P-values for the Wilcoxon tests are available in additional file , table 3. a) SPS scores for alignments of sequences with validated motifs only, compared to alignments including sequences with errors. b) SPS scores for alignments of sequences with validated motifs only, compared to alignments including sequences containing false positive (FP) motifs. c) SPS scores for alignments of sequences with validated motifs only, compared to alignments including sequences that do not contain any examples of the motif.
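    The caption's significance marks come from paired Wilcoxon signed-rank tests on SPS scores. How the paper computed its p-values is not stated here; the Python sketch below is a textbook version that assumes paired score lists, discards zero differences, ranks absolute differences with mean ranks for ties, and uses the normal approximation without continuity or tie correction (exact tables are preferable for small samples). The function name and data are hypothetical.

```python
import math


def wilcoxon_signed_rank(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return (W+, two-sided p-value from the normal approximation) for paired samples."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are discarded
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences in ascending order; ties share the mean rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return w_plus, math.erfc(abs(z) / math.sqrt(2))


w, p = wilcoxon_signed_rank([1, 2, 3, 4, 5], [0, 0, 0, 0, 0])
print(w, round(p, 3))  # 15.0 0.043
```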